{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# ⚖️ 创建一个合法偏好数据集\n",
    "\n",
    "_作者: [David Berenstein](https://huggingface.co/davidberenstein1957) 和 [Sara Han Díaz](https://huggingface.co/sdiazlor)_\n",
    "\n",
    "在本教程中，你将学习如何在 HF 推理端点上使用 Notus 模型，基于欧洲人工智能法案中的 RAG 指令创建一个合法的偏好数据集。这是一个完整的端到端示例，展示了如何使用 distilabel 来利用大型语言模型（LLMs）！\n",
    "\n",
    "[distilabel](https://github.com/argilla-io/distilabel)是一个人工智能反馈（AIF）框架，它可以使用 LLMs 生成和标注数据集，并且可以用于许多不同的用例。它以稳健性、效率和可扩展性为目标实现，允许任何人构建可用于多种场景的合成数据集。\n",
    "\n",
    "为了生成指令数据集，我们将使用与 distilabel 集成的[ HF 推理端点](https://huggingface.co/docs/inference-endpoints/en/index)。这些推理端点由 Hugging Face 提供，允许在专用和自动扩展的基础设施上轻松部署和运行 transformers、diffusers 或 Hub 中的任何可用模型。你可以在[这里](https://huggingface.co/docs/inference-endpoints/guides/create_endpoint)找到更多关于如何创建你的第一个端点的信息。\n",
    "\n",
    "我们将为此微调的 LLM 模型是 [Notus 7B](https://argilla.io/blog/notus7b/)，这是 Zephyr 7B 的一个微调版本，它使用直接偏好优化（DPO）和 AIF 技术，在多个基准测试中超越了其基础模型，并且完全开源。\n",
    "\n",
    "本教程包括以下步骤：\n",
    "\n",
    "- 为 `distilabel` 流水线定义一个自定义生成任务。\n",
    "- 使用 Haystack 为欧盟人工智能法案创建一个 RAG 流水线。\n",
    "- 使用 `SelfInstructTask` 生成一个指令数据集。\n",
    "- 使用 `UltraFeedback` 文本质量任务生成一个偏好数据集。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 简介\n",
    "让我们从安装运行 **distilabel** 以及教程中使用的其他包所需的依赖项开始；尤其是 **Haystack**。为了更好地可视化和管理结果，也请安装 **Argilla**。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install -q -U distilabel \"farm-haystack[preprocessing]\"\n",
    "!pip install -q -U \"distilabel[hf-inference-endpoints, argilla]\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 导入依赖项\n",
    "\n",
    "本教程的主要依赖项是 distilabel，用于创建合成数据集，以及 Argilla，用于可视化和管理这些数据集，同时也用于微调我们的模型。包 [Haystack](https://haystack.deepset.ai/) 用于从我们想要创建数据集的原始 PDF 文档中创建批次。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from typing import Dict\n",
    "\n",
    "from distilabel.llm import InferenceEndpointsLLM\n",
    "from distilabel.pipeline import Pipeline, pipeline\n",
    "from distilabel.tasks import TextGenerationTask, SelfInstructTask, Prompt\n",
    "\n",
    "from datasets import Dataset\n",
    "from haystack.nodes import PDFToTextConverter, PreProcessor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 环境变量\n",
    "\n",
    "我们需要提供我们的 HuggingFace 访问 token，可以从[设置](https://huggingface.co/settings/tokens)中获取。此外，为了通过 UltraFeedback 文本质量任务生成偏好数据集，我们还需要 OpenAI 的 api 密钥。你可以在[这里](https://platform.openai.com/api-keys)找到它。请注意，根据使用的模型不同，将收取不同的费用，因此请确保你检查了 OpenAI 的[定价页面](https://openai.com/pricing)。\n",
    "\n",
    "为了稍后实例化一个 `InferenceEndpointsLLM` 对象，我们还需要作为参数传递 HF 推理端点名称和 HF 命名空间。通过环境变量也是一种非常方便的方式。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"HF_TOKEN\"] = \"\"\n",
    "os.environ[\"HF_INFERENCE_ENDPOINT_NAME\"] = \"aws-notus-7b-v1-3184\"\n",
    "os.environ[\"HF_NAMESPACE\"] = \"argilla\"\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用 Notus 设置推理端点\n",
    "推理端点是 Hugging Face 管理的一种解决方案，可以轻松部署任何类似 Transformer 的模型。它们是基于 Hugging Face Hub 上的模型构建的。推理端点对于在 LLMs 上进行推理非常方便，无需尝试在本地运行模型。在本教程中，我们将使用推理端点作为 `distilabel` 工作流程的一部分，使用我们的 Notus 模型生成文本。所选端点运行着一个[ Notus 7B 实例](https://ui.endpoints.huggingface.co/argilla/endpoints/aws-notus-7b-v1-4052)。\n",
    "\n",
    "### 为 distilabel 流水线定义一个自定义生成任务\n",
    "\n",
    "为了开始本教程，让我们看看如何为我们的 Notus 模型设置一个端点。这不是我们稍后将看到的端到端示例的一部分，但是如何连接到 Hugging Face 端点以及测试 `distilabel` 流水线的一个示例。\n",
    "\n",
    "让我们快速了解一下如何使用推理端点。我们已经准备了一个简单的 `TextGenerationTask` 来向模型提问，这种方式与我们使用聊天机器人与 LLMs 交流非常相似。首先，我们定义一个用于问答任务的类，其中包含的函数向 `distilabel` 展示了模型应该如何生成提示、解析输入和输出等。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class QuestionAnsweringTask(TextGenerationTask):\n",
    "    def generate_prompt(self, question: str) -> str:\n",
    "        return Prompt(\n",
    "            system_prompt=self.system_prompt,\n",
    "            formatted_prompt=question,\n",
    "        ).format_as(\n",
    "            \"llama2\"\n",
    "        )  # type: ignore\n",
    "\n",
    "    def parse_output(self, output: str) -> Dict[str, str]:\n",
    "        return {\"answer\": output.strip()}\n",
    "\n",
    "    @property\n",
    "    def input_args_names(self) -> list[str]:\n",
    "        return [\"question\"]\n",
    "\n",
    "    @property\n",
    "    def output_args_names(self) -> list[str]:\n",
    "        return [\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`llm` 是 `InferenceEndpointsLLM` 类的一个对象，通过使用它，我们可以开始使用 `llm.generate()` 方法来生成问题的答案。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = InferenceEndpointsLLM(\n",
    "    endpoint_name_or_model_id=os.getenv(\"HF_INFERENCE_ENDPOINT_NAME\"),  # type: ignore\n",
    "    endpoint_namespace=os.getenv(\"HF_NAMESPACE\"),  # type: ignore\n",
    "    token=os.getenv(\"HF_TOKEN\") or None,\n",
    "    task=QuestionAnsweringTask(),\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用定义了端点和任务信息的 `InferenceEndpointsLLM` 对象，我们可以开始生成文本。让我们问这个 LLM，例如，丹麦第二大城市人口最多的是哪个城市。答案应该是 Aarhus。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The second most populated city in Denmark is Aarhus, with a population of around 340,000 people. It is located on the east coast of Jutland, and is known for its vibrant cultural scene, beautiful beaches, and historic landmarks. Aarhus is also home to Aarhus University, one of the largest universities in Scandinavia.'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generation = llm.generate(\n",
    "    [{\"question\": \"What's the second most populated city in Denmark?\"}]\n",
    ")\n",
    "generation[0][0][\"parsed_output\"][\"answer\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "端点工作正常！我们已经成功地为 `distilabel` 流水线设置了一个自定义生成任务。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用 Haystack 为欧洲人工智能法案创建 RAG 流水线\n",
    "\n",
    "在这个端到端的示例中，我们希望创建一个能够回答问题并填写关于欧盟推广的新人工智能法案信息的专家模型，这是关于人工智能的第一项法规。作为其数字战略的一部分，欧盟希望规范人工智能，以确保更好地发展和使用这项创新技术。这个法案是人工智能的监管框架，不同的风险级别意味着更多的或更少的监管。它们是世界上关于人工智能的第一套规则。\n",
    "\n",
    "我们想要创建的这个 RAG 的流水线会下载 PDF 文件，将其转换为纯文本并进行预处理，创建我们可以提供给 `distilabel` 的批次，以便开始从中创建指令。让我们看看流水线的第一部分并获取输入数据。需要注意的是，这个流水线的 RAG 部分并不是基于活跃的查询或语义属性的流水线，而是一种更直接的方法，我们下载PDF并预处理其内容。\n",
    "\n",
    "### 下载人工智能法案 PDF\n",
    "\n",
    "首先，我们需要下载 PDF 文档本身。如果它不在工作目录中，我们将把它放在那里。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "if [ ! -f \"The-AI-Act.pdf\" ]; then\n",
    "    wget -q https://artificialintelligenceact.eu/wp-content/uploads/2021/08/The-AI-Act.pdf\n",
    "fi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一旦我们将文件放入工作目录，我们可以使用 Haystack 的转换器和流水线功能来提取文本数据，清洗数据并将其分成不同的批次。之后，这些批次将被用来开始创建合成指令。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The converter turns the PDF into text we can process easily\n",
    "converter = PDFToTextConverter(remove_numeric_tables=True, valid_languages=[\"en\"])\n",
    "\n",
    "# Preprocessing pipelines can have several steps.\n",
    "# Ours clean empty lines, header, footers and whitespaces\n",
    "# and split the text into 150-char long batches, respecting\n",
    "# where the sentences naturally end and begin.\n",
    "preprocessor = PreProcessor(\n",
    "    clean_empty_lines=True,\n",
    "    clean_whitespace=True,\n",
    "    clean_header_footer=True,\n",
    "    split_by=\"word\",\n",
    "    split_length=150,\n",
    "    split_respect_sentence_boundary=True,\n",
    ")\n",
    "\n",
    "doc = converter.convert(file_path=\"The-AI-Act.pdf\", meta=None)[0]\n",
    "docs = preprocessor.process([doc])\n",
    "print(f\"Documents: 1\\nBatches: {len(docs)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们快速查看一下我们刚刚生成的批次。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'EN EN\\nEUROPEAN\\nCOMMISSION\\nProposal for a\\nREGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL\\nLAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE\\n(ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION\\nLEGISLATIVE ACTS\\x0cEN\\nEXPLANATORY MEMORANDUM\\n1. CONTEXT OF THE PROPOSAL\\n1.1. Reasons for and objectives of the proposal\\nThis explanatory memorandum accompanies the proposal for a Regulation laying down\\nharmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Int'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inputs = [doc.content for doc in docs]\n",
    "inputs[0][0:500]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "文件已经被正确地分批处理，从一个大文档变成了最多 150 个字符长的 355 个字符串。现在这个字符串列表可以作为输入，使用 `distilabel` 生成指令数据集。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用 SelfInstructTask 生成指令\n",
    "\n",
    "由于我们的推理端点已经启动并运行，我们应该能够使用 distilabel 生成指令。通过我们的端点由 LLM 创建的这些指令，将形成一个指令数据集，其中的指令是由我们刚刚提取的数据创建的。\n",
    "\n",
    "为了示例的顺利进行，我们使用了上面生成的 50 个批次的一个子集，以减轻性能压力。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dataset({\n",
       "    features: ['input'],\n",
       "    num_rows: 50\n",
       "})"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instructions_dataset = Dataset.from_dict({\"input\": inputs[0:50]})\n",
    "\n",
    "instructions_dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 `SelfInstructTask` 类，我们可以为构建提示生成一个 Self-Instruct 规范，就像在 [Self-Instruct 论文](https://arxiv.org/abs/2212.10560)中所做的那样。`distilabel` 将从人工制作的输入开始，在这个案例中，就是我们从 AI 法案 PDF 创建的批次，然后基于这些输入生成指令。之后，可以使用 Argilla 来审查这些指令，以保留最好的那些。\n",
    "\n",
    "我们可以通过传递一个应用描述作为参数来告诉模型它应该做什么；我们希望这个模型能够回答我们关于 AI 法案的任何问题。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions_task = SelfInstructTask(\n",
    "    application_description=\"A assistant that can answer questions about the AI Act made by the European Union.\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，我们来定义一个生成器，传入 `SelfInstructTask` 对象，并创建一个 `Pipeline` 对象。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions_generator = InferenceEndpointsLLM(\n",
    "    endpoint_name_or_model_id=os.getenv(\"HF_INFERENCE_ENDPOINT_NAME\"),  # type: ignore\n",
    "    endpoint_namespace=os.getenv(\"HF_NAMESPACE\"),  # type: ignore\n",
    "    token=os.getenv(\"HF_TOKEN\") or None,\n",
    "    task=instructions_task,\n",
    ")\n",
    "\n",
    "instructions_pipeline = Pipeline(generator=instructions_generator)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们的流水线已经准备好用来生成指令了。下面就开始吧！\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "generated_instructions = instructions_pipeline.generate(\n",
    "    dataset=instructions_dataset, num_generations=1, batch_size=8\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "流水线已经成功生成了指令，基于输入的主题和行为。让我们收集所有这些指令，看看它们是什么样的。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of generated instructions: 178\n",
      "What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?\n",
      "How can artificial intelligence improve prediction, optimise operations and resource allocation, and personalise service delivery?\n",
      "What benefits can artificial intelligence bring to the European economy and society as a whole?\n",
      "How can the use of artificial intelligence support socially and environmentally beneficial outcomes?\n",
      "What are the high-impact sectors that require AI action according to the AI Act by the European Union?\n"
     ]
    }
   ],
   "source": [
    "instructions = []\n",
    "for generations in generated_instructions[\"instructions\"]:\n",
    "    for generation in generations:\n",
    "        instructions.extend(generation)\n",
    "\n",
    "print(f\"Number of generated instructions: {len(instructions)}\")\n",
    "\n",
    "for instruction in instructions[:5]:\n",
    "    print(instruction)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这些初始指令构成了我们的指令数据集。遵循人机协同的方法，我们应该将指令推送到 Argilla 进行可视化，并能够根据质量对它们进行排序。这些注释对于制作高质量的数据至关重要，确保最终模型有更好的性能。然而，这一步是可选的。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 将指令数据集推送到Argilla以进行可视化和注释。\n",
    "\n",
    "让我们快速查看一下由 `SelfInstructTask` 生成的指令。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'input': 'EN EN\\nEUROPEAN\\nCOMMISSION\\nProposal for a\\nREGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL\\nLAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE\\n(ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION\\nLEGISLATIVE ACTS\\x0cEN\\nEXPLANATORY MEMORANDUM\\n1. CONTEXT OF THE PROPOSAL\\n1.1. Reasons for and objectives of the proposal\\nThis explanatory memorandum accompanies the proposal for a Regulation laying down\\nharmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Intelligence\\n(AI) is a fast evolving family of technologies that can bring a wide array of economic and\\nsocietal benefits across the entire spectrum of industries and social activities. By improving\\nprediction, optimising operations and resource allocation, and personalising service delivery,\\nthe use of artificial intelligence can support socially and environmentally beneficial outcomes\\nand provide key competitive advantages to companies and the European economy. ',\n",
       " 'generation_model': ['argilla/notus-7b-v1'],\n",
       " 'generation_prompt': ['You are an expert prompt writer, writing the best and most diverse prompts for a variety of tasks. You are given a task description and a set of instructions for how to write the prompts for an specific AI application.\\n# Task Description\\nDevelop 5 user queries that can be received by the given AI application and applicable to the provided context. Emphasize diversity in verbs and linguistic structures within the model\\'s textual capabilities.\\n\\n# Criteria for Queries\\nIncorporate a diverse range of verbs, avoiding repetition.\\nEnsure queries are compatible with AI model\\'s text generation functions and are limited to 1-2 sentences.\\nDesign queries to be self-contained and standalone.\\nBlend interrogative (e.g., \"What is the significance of x?\") and imperative (e.g., \"Detail the process of x.\") styles.\\nWrite each query on a separate line and avoid using numbered lists or bullet points.\\n\\n# AI Application\\nA assistant that can answer questions about the AI Act made by the European Union.\\n\\n# Context\\nEN EN\\nEUROPEAN\\nCOMMISSION\\nProposal for a\\nREGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL\\nLAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE\\n(ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION\\nLEGISLATIVE ACTS\\x0cEN\\nEXPLANATORY MEMORANDUM\\n1. CONTEXT OF THE PROPOSAL\\n1.1. Reasons for and objectives of the proposal\\nThis explanatory memorandum accompanies the proposal for a Regulation laying down\\nharmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Intelligence\\n(AI) is a fast evolving family of technologies that can bring a wide array of economic and\\nsocietal benefits across the entire spectrum of industries and social activities. By improving\\nprediction, optimising operations and resource allocation, and personalising service delivery,\\nthe use of artificial intelligence can support socially and environmentally beneficial outcomes\\nand provide key competitive advantages to companies and the European economy. \\n\\n# Output\\n'],\n",
       " 'raw_generation_responses': ['1. What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?\\n2. How can artificial intelligence improve prediction, optimise operations and resource allocation, and personalise service delivery?\\n3. What benefits can artificial intelligence bring to the European economy and society as a whole?\\n4. How can the use of artificial intelligence support socially and environmentally beneficial outcomes?\\n5. What competitive advantages can companies gain from using artificial intelligence?'],\n",
       " 'instructions': [['What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?',\n",
       "   'How can artificial intelligence improve prediction, optimise operations and resource allocation, and personalise service delivery?',\n",
       "   'What benefits can artificial intelligence bring to the European economy and society as a whole?',\n",
       "   'How can the use of artificial intelligence support socially and environmentally beneficial outcomes?']]}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "generated_instructions[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对于每个输入，即 AI 法案 PDF 文件的每个批次，我们都有一个生成器提示，其中包含了关于如何行动的通用指南，以及应用程序描述参数。每个输入已经生成了 4 条指令。\n",
    "\n",
    "现在正好是将指令数据集上传到 Argilla，审查并手动注释它的最佳时机。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "FeedbackRecord(fields={'input': 'EN EN\\nEUROPEAN\\nCOMMISSION\\nProposal for a\\nREGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL\\nLAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE\\n(ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION\\nLEGISLATIVE ACTS\\x0cEN\\nEXPLANATORY MEMORANDUM\\n1. CONTEXT OF THE PROPOSAL\\n1.1. Reasons for and objectives of the proposal\\nThis explanatory memorandum accompanies the proposal for a Regulation laying down\\nharmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Intelligence\\n(AI) is a fast evolving family of technologies that can bring a wide array of economic and\\nsocietal benefits across the entire spectrum of industries and social activities. By improving\\nprediction, optimising operations and resource allocation, and personalising service delivery,\\nthe use of artificial intelligence can support socially and environmentally beneficial outcomes\\nand provide key competitive advantages to companies and the European economy.', 'instruction': 'What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?'}, metadata={'length-input': 964, 'length-instruction': 129, 'generation-model': 'argilla/notus-7b-v1'}, vectors={}, responses=[], suggestions=(), external_id=None)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instructions_rg_dataset = generated_instructions.to_argilla()\n",
    "instructions_rg_dataset[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions_rg_dataset.push_to_argilla(name=f\"notus_AI_instructions\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在 Argilla 的用户界面中，每个输入-指令元组都会单独显示，并且可以单独进行注释。\n",
    "\n",
    "![Instruction dataset](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/instrucion_dataset_notus_ui.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用 Ultrafeedback 文本质量任务生成偏好数据集\n",
    "\n",
    "一旦我们有了指令数据集，我们将会通过 UltraFeedback 文本质量任务创建一个偏好数据集。这是一种在自然语言处理中用于评估生成文本质量的任务类型；我们的目标是提供关于生成文本质量的详细反馈，而不仅仅是二元的标签。\n",
    "我们的 `pipeline()` 方法允许我们为给定的任务创建一个带有提供的 LLMs 的 `Pipeline` 实例，这在你想要为给定任务使用预定义或自定义的 `Pipeline` 时非常有用。我们将指定我们的任务和子任务，我们想要使用的生成器（在这个案例中，是基于文本生成任务的生成器）以及我们的 OpenAI API 密钥。\n",
    "\n",
    "> 请注意，不使用 OpenAI 模型来获取此反馈也是可能的。然而，性能将会受到影响，反馈的质量也会较低。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "preference_pipeline = pipeline(\n",
    "    \"preference\",\n",
    "    \"instruction-following\",\n",
    "    generator=InferenceEndpointsLLM(\n",
    "        endpoint_name_or_model_id=os.getenv(\"HF_INFERENCE_ENDPOINT_NAME\"),  # type: ignore\n",
    "        endpoint_namespace=os.getenv(\"HF_NAMESPACE\", None),\n",
    "        task=TextGenerationTask(),\n",
    "        max_new_tokens=256,\n",
    "        num_threads=2,\n",
    "        temperature=0.3,\n",
    "    ),\n",
    "    max_new_tokens=256,\n",
    "    num_threads=2,\n",
    "    api_key=os.getenv(\"OPENAI_API_KEY\", None),\n",
    "    temperature=0.0,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们还需要从 Argilla 检索我们的指令数据集，因为它将是这个流水线的输入。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dataset({\n",
       "    features: ['input', 'instruction', 'instruction-rating', 'instruction-rating-suggestion', 'instruction-rating-suggestion-metadata', 'external_id', 'metadata'],\n",
       "    num_rows: 100\n",
       "})"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "remote_dataset = rg.FeedbackDataset.from_argilla(\n",
    "    \"notus_AI_instructions\", workspace=\"admin\"\n",
    ")\n",
    "instructions_dataset = remote_dataset.pull(max_records=100)  # get first 100 records\n",
    "\n",
    "instructions_dataset = instructions_dataset.format_as(\"datasets\")\n",
    "instructions_dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'input': 'EN EN\\nEUROPEAN\\nCOMMISSION\\nProposal for a\\nREGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL\\nLAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE\\n(ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION\\nLEGISLATIVE ACTS\\x0cEN\\nEXPLANATORY MEMORANDUM\\n1. CONTEXT OF THE PROPOSAL\\n1.1. Reasons for and objectives of the proposal\\nThis explanatory memorandum accompanies the proposal for a Regulation laying down\\nharmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Intelligence\\n(AI) is a fast evolving family of technologies that can bring a wide array of economic and\\nsocietal benefits across the entire spectrum of industries and social activities. By improving\\nprediction, optimising operations and resource allocation, and personalising service delivery,\\nthe use of artificial intelligence can support socially and environmentally beneficial outcomes\\nand provide key competitive advantages to companies and the European economy.',\n",
       " 'instruction': 'What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?',\n",
       " 'instruction-rating': [],\n",
       " 'instruction-rating-suggestion': None,\n",
       " 'instruction-rating-suggestion-metadata': {'type': None,\n",
       "  'score': None,\n",
       "  'agent': None},\n",
       " 'external_id': None,\n",
       " 'metadata': '{\"length-input\": 964, \"length-instruction\": 129, \"generation-model\": \"argilla/notus-7b-v1\"}'}"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instructions_dataset[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在根据我们的指令生成文本之前，我们需要重命名数据集中的某些列。从前面的部分，我们仍然有旧的输入，即来自 PDF 的批次。我们需要将它们改为我们生成的指令。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "instructions_dataset = instructions_dataset.rename_columns({\"input\": \"context\", \"instruction\": \"input\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在，让我们使用刚刚创建的流水线以及生成我们指令的主题来构建一个数据集。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "preference_dataset = preference_pipeline.generate(\n",
    "    instructions_dataset,  # type: ignore\n",
    "    num_generations=2,\n",
    "    batch_size=8,\n",
    "    display_progress_bar=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们来看一下偏好数据集的一个实例："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'context': 'EN EN\\nEUROPEAN\\nCOMMISSION\\nProposal for a\\nREGULATION OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL\\nLAYING DOWN HARMONISED RULES ON ARTIFICIAL INTELLIGENCE\\n(ARTIFICIAL INTELLIGENCE ACT) AND AMENDING CERTAIN UNION\\nLEGISLATIVE ACTS\\x0cEN\\nEXPLANATORY MEMORANDUM\\n1. CONTEXT OF THE PROPOSAL\\n1.1. Reasons for and objectives of the proposal\\nThis explanatory memorandum accompanies the proposal for a Regulation laying down\\nharmonised rules on artificial intelligence (Artificial Intelligence Act). Artificial Intelligence\\n(AI) is a fast evolving family of technologies that can bring a wide array of economic and\\nsocietal benefits across the entire spectrum of industries and social activities. By improving\\nprediction, optimising operations and resource allocation, and personalising service delivery,\\nthe use of artificial intelligence can support socially and environmentally beneficial outcomes\\nand provide key competitive advantages to companies and the European economy.',\n",
       " 'input': 'What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?',\n",
       " 'instruction-rating': [],\n",
       " 'instruction-rating-suggestion': None,\n",
       " 'instruction-rating-suggestion-metadata': {'agent': None,\n",
       "  'score': None,\n",
       "  'type': None},\n",
       " 'external_id': None,\n",
       " 'metadata': '{\"length-input\": 964, \"length-instruction\": 129, \"generation-model\": \"argilla/notus-7b-v1\"}',\n",
       " 'generation_model': ['argilla/notus-7b-v1', 'argilla/notus-7b-v1'],\n",
       " 'generation_prompt': [\"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\\nWhat are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?\",\n",
       "  \"You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.\\nIf a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\\nWhat are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?\"],\n",
       " 'raw_generation_responses': [\"\\nThe proposal for a Regulation laying down harmonised rules on artificial intelligence (AI) aims to ensure the trustworthy use of AI in the EU. It seeks to create a single market for AI applications and services, while ensuring that they are safe and respect fundamental rights. The proposal is part of the EU's broader strategy on AI, which aims to put the EU at the forefront of global AI development and deployment.\\nThe objectives of the proposal are to:\\n\\n1. Ensure that AI systems are designed, developed, and deployed in a way that respects fundamental rights and values, including human dignity, freedom, and privacy.\\n2. Ensure that AI systems are safe and secure, and do not pose unacceptable risks to people, property, or the environment.\\n3. Ensure that AI systems are robust, reliable, and accurate, and can be trusted to deliver the intended functionality.\\n4. Ensure that AI systems are traceable, meaning that it is possible to track how they work and how they make decisions.\\n5. Ensure that AI systems are transparent, meaning that it is possible to understand how they work and how they make decisions.\\n6. Ensure that AI systems are fair, meaning that they do not discriminate against individuals\",\n",
       "  '\\nThe proposal for a Regulation laying down harmonised rules on artificial intelligence (AI) aims to ensure a high level of safety and security of AI systems and to establish a horizontal and technology-neutral framework for AI applications. This will help to create a single market for AI and to ensure that AI systems are developed and deployed in a responsible manner. The proposal will also help to strengthen the competitiveness of the EU industry in the global AI market.\\nThe objectives of the proposal are:\\n1. To ensure that AI systems are safe and secure by establishing a risk-based framework for the development, placement on the market and use of AI systems.\\n2. To establish a horizontal and technology-neutral framework for AI applications that is applicable to all sectors and types of AI systems.\\n3. To ensure that AI systems are developed and deployed in a responsible manner by establishing requirements for transparency, robustness, security, accuracy, controllability and privacy protection.\\n4. To create a single market for AI by ensuring that AI systems are developed and deployed in a harmonised manner across the EU.\\n5. To strengthen the competitiveness of the EU industry in the global AI market by creating a level playing field for businesses and by promoting the'],\n",
       " 'generations': [\"\\nThe proposal for a Regulation laying down harmonised rules on artificial intelligence (AI) aims to ensure the trustworthy use of AI in the EU. It seeks to create a single market for AI applications and services, while ensuring that they are safe and respect fundamental rights. The proposal is part of the EU's broader strategy on AI, which aims to put the EU at the forefront of global AI development and deployment.\\nThe objectives of the proposal are to:\\n\\n1. Ensure that AI systems are designed, developed, and deployed in a way that respects fundamental rights and values, including human dignity, freedom, and privacy.\\n2. Ensure that AI systems are safe and secure, and do not pose unacceptable risks to people, property, or the environment.\\n3. Ensure that AI systems are robust, reliable, and accurate, and can be trusted to deliver the intended functionality.\\n4. Ensure that AI systems are traceable, meaning that it is possible to track how they work and how they make decisions.\\n5. Ensure that AI systems are transparent, meaning that it is possible to understand how they work and how they make decisions.\\n6. Ensure that AI systems are fair, meaning that they do not discriminate against individuals\",\n",
       "  '\\nThe proposal for a Regulation laying down harmonised rules on artificial intelligence (AI) aims to ensure a high level of safety and security of AI systems and to establish a horizontal and technology-neutral framework for AI applications. This will help to create a single market for AI and to ensure that AI systems are developed and deployed in a responsible manner. The proposal will also help to strengthen the competitiveness of the EU industry in the global AI market.\\nThe objectives of the proposal are:\\n1. To ensure that AI systems are safe and secure by establishing a risk-based framework for the development, placement on the market and use of AI systems.\\n2. To establish a horizontal and technology-neutral framework for AI applications that is applicable to all sectors and types of AI systems.\\n3. To ensure that AI systems are developed and deployed in a responsible manner by establishing requirements for transparency, robustness, security, accuracy, controllability and privacy protection.\\n4. To create a single market for AI by ensuring that AI systems are developed and deployed in a harmonised manner across the EU.\\n5. To strengthen the competitiveness of the EU industry in the global AI market by creating a level playing field for businesses and by promoting the'],\n",
       " 'labelling_model': 'gpt-3.5-turbo',\n",
       " 'labelling_prompt': [{'content': 'Your role is to evaluate text quality based on given criteria.',\n",
       "   'role': 'system'},\n",
       "  {'content': \"\\n# Instruction Following Assessment\\nEvaluate alignment between output and intent. Assess understanding of task goal and restrictions.\\n**Instruction Components**: Task Goal (intended outcome), Restrictions (text styles, formats, or designated methods, etc).\\n\\n**Scoring**: Rate outputs 1 to 5:\\n\\n1. **Irrelevant**: No alignment.\\n2. **Partial Focus**: Addresses one aspect poorly.\\n3. **Partial Compliance**:\\n\\t- (1) Meets goal or restrictions, neglecting other.\\n\\t- (2) Acknowledges both but slight deviations.\\n4. **Almost There**: Near alignment, minor deviations.\\n5. **Comprehensive Compliance**: Fully aligns, meets all requirements.\\n\\n---\\n\\n## Format\\n\\n### Input\\nInstruction: [Specify task goal and restrictions]\\n\\nTexts:\\n\\n<text 1> [Text 1]\\n<text 2> [Text 2]\\n\\n### Output\\n\\n#### Output for Text 1\\nRating: [Rating for text 1]\\nRationale: [Rationale for the rating in short sentences]\\n\\n#### Output for Text 2\\nRating: [Rating for text 2]\\nRationale: [Rationale for the rating in short sentences]\\n\\n---\\n\\n## Annotation\\n\\n### Input\\nInstruction: What are the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence?\\n\\nTexts:\\n\\n<text 1> \\nThe proposal for a Regulation laying down harmonised rules on artificial intelligence (AI) aims to ensure the trustworthy use of AI in the EU. It seeks to create a single market for AI applications and services, while ensuring that they are safe and respect fundamental rights. The proposal is part of the EU's broader strategy on AI, which aims to put the EU at the forefront of global AI development and deployment.\\nThe objectives of the proposal are to:\\n\\n1. Ensure that AI systems are designed, developed, and deployed in a way that respects fundamental rights and values, including human dignity, freedom, and privacy.\\n2. Ensure that AI systems are safe and secure, and do not pose unacceptable risks to people, property, or the environment.\\n3. Ensure that AI systems are robust, reliable, and accurate, and can be trusted to deliver the intended functionality.\\n4. Ensure that AI systems are traceable, meaning that it is possible to track how they work and how they make decisions.\\n5. Ensure that AI systems are transparent, meaning that it is possible to understand how they work and how they make decisions.\\n6. Ensure that AI systems are fair, meaning that they do not discriminate against individuals\\n<text 2> \\nThe proposal for a Regulation laying down harmonised rules on artificial intelligence (AI) aims to ensure a high level of safety and security of AI systems and to establish a horizontal and technology-neutral framework for AI applications. This will help to create a single market for AI and to ensure that AI systems are developed and deployed in a responsible manner. The proposal will also help to strengthen the competitiveness of the EU industry in the global AI market.\\nThe objectives of the proposal are:\\n1. To ensure that AI systems are safe and secure by establishing a risk-based framework for the development, placement on the market and use of AI systems.\\n2. To establish a horizontal and technology-neutral framework for AI applications that is applicable to all sectors and types of AI systems.\\n3. To ensure that AI systems are developed and deployed in a responsible manner by establishing requirements for transparency, robustness, security, accuracy, controllability and privacy protection.\\n4. To create a single market for AI by ensuring that AI systems are developed and deployed in a harmonised manner across the EU.\\n5. To strengthen the competitiveness of the EU industry in the global AI market by creating a level playing field for businesses and by promoting the\\n\\n### Output \",\n",
       "   'role': 'user'}],\n",
       " 'raw_labelling_response': '#### Output for Text 1\\nRating: 5\\nRationale: The text fully aligns with the task goal and restrictions. It clearly states the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence, including ensuring the trustworthy use of AI, creating a single market for AI applications and services, and ensuring safety, respect for fundamental rights, robustness, transparency, and fairness of AI systems.\\n\\n#### Output for Text 2\\nRating: 4\\nRationale: The text mostly aligns with the task goal and restrictions. It addresses the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence, including ensuring safety and security of AI systems, establishing a horizontal and technology-neutral framework, promoting responsible development and deployment of AI systems, creating a single market for AI, and strengthening the competitiveness of the EU industry in the global AI market. However, it does not explicitly mention the need to respect fundamental rights, accuracy of AI systems, and traceability of AI systems, which are mentioned in the task goal and restrictions.',\n",
       " 'rating': [5.0, 4.0],\n",
       " 'rationale': ['The text fully aligns with the task goal and restrictions. It clearly states the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence, including ensuring the trustworthy use of AI, creating a single market for AI applications and services, and ensuring safety, respect for fundamental rights, robustness, transparency, and fairness of AI systems.',\n",
       "  'The text mostly aligns with the task goal and restrictions. It addresses the reasons for and objectives of the proposal for a Regulation laying down harmonised rules on artificial intelligence, including ensuring safety and security of AI systems, establishing a horizontal and technology-neutral framework, promoting responsible development and deployment of AI systems, creating a single market for AI, and strengthening the competitiveness of the EU industry in the global AI market. However, it does not explicitly mention the need to respect fundamental rights, accuracy of AI systems, and traceability of AI systems, which are mentioned in the task goal and restrictions.']}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "preference_dataset[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用 Argilla 进行人工反馈\n",
    "\n",
    "你可以直接使用 distilabel 创建的 AI 反馈，但我们已经看到，通过加入人工反馈可以提升 LLM 的质量。我们提供了一个 `to_argilla` 方法，它为 Argilla 创建了一个数据集，并附带了现成的定制元数据过滤器以及语义搜索，让你能够尽可能快速和有趣地提供人工反馈。你可以查看[ Argilla 文档](https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html)来了解如何安装和运行。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果你正在使用 Docker 快速启动镜像或 Hugging Face Spaces 运行Argilla，你需要使用 URL 和 API_KEY 初始化 Argilla 客户端："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import argilla as rg\n",
    "\n",
    "# Replace api_url with the url to your HF Spaces URL if using Spaces\n",
    "# Replace api_key if you configured a custom API key\n",
    "rg.init(\n",
    "    api_url=\"http://localhost:6900\",\n",
    "    api_key=\"owner.apikey\",\n",
    "    workspace=\"admin\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一旦我们成功地制作出了偏好数据集，Argilla 的用户界面就是最适合我们用来查看和标记这些数据的东西。就像我们对指令数据集所做的那样，我们只需要把这个数据集变成 Argilla 能理解的格式，然后上传到 Argilla 上就可以开始工作了。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Uploading the Preference Dataset\n",
    "preference_rg_dataset = preference_dataset.to_argilla()\n",
    "\n",
    "# Adding the context as a metadata property in the new Feedback dataset, as this\n",
    "# information will be useful later.\n",
    "for record_feedback, record_huggingface in zip(\n",
    "    preference_rg_dataset, preference_dataset\n",
    "):\n",
    "    record_feedback.metadata[\"context\"] = record_huggingface[\"context\"]\n",
    "\n",
    "preference_rg_dataset.push_to_argilla(name=f\"notus_AI_preference\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在Argilla用户界面中，我们可以看到输入（一个指令），以及 LLM 基于该指令创建的两个生成文本。\n",
    "\n",
    "![Preference dataset](https://huggingface.co/datasets/huggingface/cookbook-images/resolve/main/preference_dataset_notus_ui.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 结论\n",
    "\n",
    "总结一下，我们已经完成了一个使用 distilabel 的端到端示例。我们建立了一个推理端点，定义了一个从 PDF 提取信息的 distilabel 流水线，并创建和手动审查了从该输入生成的指令和偏好数据集。最终的偏好数据集非常适合进行微调，你可以使用 Argilla 的 ArgillaTrainer 轻松完成这一工作。如果你想深入了解，请查看以下资源：\n",
    "\n",
    "- [使用 ArgillaTrainer 训练模型](https://docs.argilla.io/en/latest/tutorials_and_integrations/tutorials/feedback/end2end_examples/train-model-006.html)\n",
    "- [Ⓜ️ 将 LLM 作为聊天助手进行监督式微调：Mistral 7B](https://docs.argilla.io/en/latest/tutorials_and_integrations/tutorials/feedback/training-llm-mistral-sft.html)\n",
    "- [🌠 通过优化检索和重排模型来改进 RAG](https://docs.argilla.io/en/latest/tutorials_and_integrations/tutorials/feedback/fine-tuning-sentencesimilarity-rag.html)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
